Conversation

paleolimbot
Member

@paleolimbot paleolimbot commented Oct 1, 2025

Which issue does this PR close?

Rationale for this change

One of the primary reasons the GeoParquet community was excited about first-class Parquet Geometry/Geography support was the built-in column chunk statistics (we had a workaround that involved adding a struct column, but it was difficult for non-spatial readers to use it and very difficult for non-spatial writers to write it). This PR ensures it is possible for arrow-rs to write files that include those statistics.

What changes are included in this PR?

This PR inserts the minimum required change to enable this support.

Are these changes tested?

Yes!

Are there any user-facing changes?

There are several new functions (which include documentation). Previously it was difficult or impossible to actually write Geometry or Geography logical types, and so it is unlikely any previous usage would be affected.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 1, 2025
Member Author

@paleolimbot paleolimbot left a comment

It works!

@alamb @etseidl I'm aware this would need some tests/improved documentation at a lower level; however, I'd love some feedback on the approach before I go through and clean this up more thoroughly (whenever time allows!)

Contributor

@etseidl etseidl left a comment

Thanks @paleolimbot, looks pretty good on a first pass. I just want to make sure that the size statistics are written properly when geo stats are enabled.

Comment on lines 166 to 168
if let Some(var_bytes) = T::T::variable_length_bytes(slice) {
    *self.variable_length_bytes.get_or_insert(0) += var_bytes;
}
Contributor

I think this should execute regardless of whether geo stats are enabled. The variable_length_bytes are ultimately written to the SizeStatistics which are useful even without min/max statistics.

Member Author

Done!
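
The point of this exchange can be sketched in isolation. The following is a minimal illustration, not the arrow-rs encoder: `ChunkState`, `geo_values_seen`, and `write_batch` are hypothetical names invented here, and the geospatial update is reduced to a counter. It only shows the shape of the fix, where the byte count feeding the size statistics is accumulated whether or not geospatial statistics are enabled.

```rust
/// Hypothetical stand-in for per-chunk encoder state (not the arrow-rs type).
struct ChunkState {
    /// Total bytes of variable-length values seen so far; feeds size statistics.
    variable_length_bytes: Option<i64>,
    /// Values folded into geospatial statistics; `None` when they are disabled.
    geo_values_seen: Option<usize>,
}

impl ChunkState {
    fn new(geo_stats_enabled: bool) -> Self {
        Self {
            variable_length_bytes: None,
            geo_values_seen: geo_stats_enabled.then_some(0),
        }
    }

    /// Record a batch of byte-array values.
    fn write_batch(&mut self, values: &[&[u8]]) {
        // Geospatial statistics are optional: update them only when enabled.
        if let Some(seen) = self.geo_values_seen.as_mut() {
            *seen += values.len();
        }

        // Size statistics are tracked unconditionally, outside the branch above.
        let var_bytes: i64 = values.iter().map(|v| v.len() as i64).sum();
        *self.variable_length_bytes.get_or_insert(0) += var_bytes;
    }
}
```

With this layout, a state created with `ChunkState::new(false)` still reports a byte total after `write_batch`, which is the behaviour requested in the review.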

drop(file_writer);

// Check that statistics exist in thrift output
thrift_metadata.row_groups[0].columns[0]
Contributor

Heads up that when the thrift stuff merges this will no longer be a format::FileMetaData but file::metadata::ParquetMetaData.

Member Author

Got it! I removed these assertions so that they won't break when the thrift stuff merges (although there will be a few logical type constructors that will need to be updated).

@paleolimbot
Member Author

Thank you for the review! I will clean this up on Monday and add a few more tests.

@etseidl
Contributor

etseidl commented Oct 8, 2025

@paleolimbot I took a stab at resolving the merge conflicts. They are mostly trivial, but I wasn't sure how to resolve the tests. I'll leave that up to you 😄.

@alamb
Contributor

alamb commented Oct 8, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing spatial-stats-write (f9112f7) to d5df352 diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_writer
BENCH_FILTER=
BENCH_BRANCH_NAME=spatial-stats-write
Results will be posted here when complete

Contributor

@alamb alamb left a comment

Thank you @paleolimbot and @etseidl

I reviewed this PR for test coverage and structure, and from my perspective it is good to go. I had a few minor comments / suggestions, but nothing I think would prevent merging

}
}

/// Explicitly specify the Parquet schema to be used
Contributor

this is a nice API addition I think

/// ```
#[derive(Clone, Debug, PartialEq, Default)]
pub struct GeospatialStatistics {
/// Optional bounding defining the spatial extent, where None represents a lack of information.
Contributor

I wonder why these comments were removed?

Member Author

I moved them to the accessor methods in a previous change...I'm not sure why they're showing up in this diff. My theory was that they'd be more likely to be read there but I don't mind copying them back.

fn new_accumulator(&self, descr: &ColumnDescPtr) -> Box<dyn GeoStatsAccumulator>;
}

/// Dynamic [`GeospatialStatistics`] accumulator
Contributor

this is a nice API for optional statistics encoding
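
To make the factory-plus-accumulator shape concrete, here is a rough, self-contained sketch. It is not the arrow-rs API: `BBox`, `BoundsAccumulator`, `BoundsAccumulatorFactory`, `XyBounds`, and `XyBoundsFactory` are all hypothetical names, and the accumulator takes already-decoded X/Y pairs rather than WKB. It only illustrates why a boxed trait object makes the statistics optional and swappable.

```rust
/// Hypothetical X/Y bounding box (not the arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    xmin: f64,
    xmax: f64,
    ymin: f64,
    ymax: f64,
}

/// Hypothetical accumulator: fed one coordinate at a time, finished once per chunk.
trait BoundsAccumulator {
    fn update(&mut self, x: f64, y: f64);
    /// Returns `None` when no usable coordinates were seen.
    fn finish(&mut self) -> Option<BBox>;
}

/// Hypothetical factory, so a writer can hold any implementation behind a trait object.
trait BoundsAccumulatorFactory {
    fn new_accumulator(&self) -> Box<dyn BoundsAccumulator>;
}

#[derive(Default)]
struct XyBounds {
    bbox: Option<BBox>,
}

impl BoundsAccumulator for XyBounds {
    fn update(&mut self, x: f64, y: f64) {
        // Skip NaN coordinates so they cannot poison the min/max comparison.
        if x.is_nan() || y.is_nan() {
            return;
        }
        self.bbox = Some(match self.bbox {
            None => BBox { xmin: x, xmax: x, ymin: y, ymax: y },
            Some(b) => BBox {
                xmin: b.xmin.min(x),
                xmax: b.xmax.max(x),
                ymin: b.ymin.min(y),
                ymax: b.ymax.max(y),
            },
        });
    }

    fn finish(&mut self) -> Option<BBox> {
        // `take` resets the state so the accumulator can serve the next chunk.
        self.bbox.take()
    }
}

struct XyBoundsFactory;

impl BoundsAccumulatorFactory for XyBoundsFactory {
    fn new_accumulator(&self) -> Box<dyn BoundsAccumulator> {
        Box::new(XyBounds::default())
    }
}
```

A caller would obtain an accumulator with `XyBoundsFactory.new_accumulator()`, call `update` per value, and call `finish` when the chunk is flushed; a build without geospatial support could supply a factory whose accumulator always returns `None`.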

# Enable parquet variant support
variant_experimental = ["arrow", "parquet-variant", "parquet-variant-json", "parquet-variant-compute"]
# Enable geospatial support
geospatial = ["parquet-geospatial"]
Contributor

Could you please also add the new feature flag to the main crate readme as well?

https://github.com/apache/arrow-rs/blob/main/parquet/README.md#feature-flags

@alamb
Contributor

alamb commented Oct 8, 2025

🤖: Benchmark completed

Details

group                                     main                                   spatial-stats-write
-----                                     ----                                   -------------------
bool/bloom_filter                         1.00    129.9±0.83µs     8.2 MB/sec    1.01    130.8±0.41µs     8.1 MB/sec
bool/default                              1.00     53.2±0.18µs    19.9 MB/sec    1.03     54.8±0.15µs    19.4 MB/sec
bool/parquet_2                            1.00     68.2±0.15µs    15.6 MB/sec    1.05     71.5±0.19µs    14.8 MB/sec
bool/zstd                                 1.00     64.0±0.25µs    16.6 MB/sec    1.02     65.2±0.32µs    16.3 MB/sec
bool/zstd_parquet_2                       1.00     78.6±0.36µs    13.5 MB/sec    1.04     82.0±0.57µs    12.9 MB/sec
bool_non_null/bloom_filter                1.01    106.1±0.39µs     5.4 MB/sec    1.00    105.3±0.60µs     5.4 MB/sec
bool_non_null/default                     1.00     19.8±0.06µs    28.8 MB/sec    1.00     19.9±0.38µs    28.7 MB/sec
bool_non_null/parquet_2                   1.00     37.8±0.49µs    15.1 MB/sec    1.01     38.1±0.13µs    15.0 MB/sec
bool_non_null/zstd                        1.01     28.6±0.18µs    20.0 MB/sec    1.00     28.4±0.14µs    20.1 MB/sec
bool_non_null/zstd_parquet_2              1.00     47.2±0.32µs    12.1 MB/sec    1.01     47.9±0.32µs    12.0 MB/sec
float_with_nans/bloom_filter              1.00   940.8±12.22µs    58.4 MB/sec    1.00    943.2±6.94µs    58.3 MB/sec
float_with_nans/default                   1.00    572.4±2.04µs    96.0 MB/sec    1.00    573.9±4.01µs    95.8 MB/sec
float_with_nans/parquet_2                 1.00    825.7±3.25µs    66.6 MB/sec    1.00    825.7±3.48µs    66.6 MB/sec
float_with_nans/zstd                      1.01    752.6±8.88µs    73.0 MB/sec    1.00    746.3±1.57µs    73.6 MB/sec
float_with_nans/zstd_parquet_2            1.00  1005.0±13.88µs    54.7 MB/sec    1.00   1001.6±4.21µs    54.9 MB/sec
list_primitive/bloom_filter               1.00      4.0±0.06ms   535.5 MB/sec    1.13      4.5±0.17ms   472.7 MB/sec
list_primitive/default                    1.00  1745.4±14.39µs  1221.4 MB/sec    1.06  1846.9±15.71µs  1154.3 MB/sec
list_primitive/parquet_2                  1.00      2.3±0.01ms   908.3 MB/sec    1.14      2.7±0.04ms   794.4 MB/sec
list_primitive/zstd                       1.00      4.2±0.04ms   511.7 MB/sec    1.10      4.6±0.09ms   464.3 MB/sec
list_primitive/zstd_parquet_2             1.00      4.2±0.04ms   510.1 MB/sec    1.04      4.4±0.11ms   489.8 MB/sec
list_primitive_non_null/bloom_filter      1.00      4.8±0.08ms   438.9 MB/sec    1.00      4.9±0.12ms   438.3 MB/sec
list_primitive_non_null/default           1.00  1836.3±11.47µs  1158.5 MB/sec    1.01  1848.6±12.75µs  1150.8 MB/sec
list_primitive_non_null/parquet_2         1.12      3.3±0.04ms   639.9 MB/sec    1.00      3.0±0.02ms   718.3 MB/sec
list_primitive_non_null/zstd              1.00      5.5±0.05ms   385.9 MB/sec    1.00      5.5±0.04ms   387.7 MB/sec
list_primitive_non_null/zstd_parquet_2    1.00      5.9±0.07ms   360.4 MB/sec    1.00      5.9±0.07ms   361.7 MB/sec
primitive/bloom_filter                    1.00      4.3±0.11ms    41.1 MB/sec    1.01      4.3±0.11ms    40.9 MB/sec
primitive/default                         1.00    849.6±2.82µs   207.1 MB/sec    1.03    875.5±7.96µs   200.9 MB/sec
primitive/parquet_2                       1.00   1008.6±5.47µs   174.4 MB/sec    1.02   1033.8±4.40µs   170.2 MB/sec
primitive/zstd                            1.00   1152.6±7.40µs   152.6 MB/sec    1.02   1172.5±6.20µs   150.0 MB/sec
primitive/zstd_parquet_2                  1.00   1343.2±5.55µs   131.0 MB/sec    1.02   1370.7±7.62µs   128.4 MB/sec
primitive_non_null/bloom_filter           1.00      4.3±0.15ms    39.9 MB/sec    1.01      4.4±0.17ms    39.7 MB/sec
primitive_non_null/default                1.00    720.4±3.33µs   239.5 MB/sec    1.01    724.3±3.90µs   238.2 MB/sec
primitive_non_null/parquet_2              1.00   867.1±35.25µs   199.0 MB/sec    1.02    888.2±6.67µs   194.2 MB/sec
primitive_non_null/zstd                   1.00   1000.5±6.52µs   172.4 MB/sec    1.01   1007.3±4.85µs   171.3 MB/sec
primitive_non_null/zstd_parquet_2         1.00   1301.0±7.07µs   132.6 MB/sec    1.01   1311.5±6.67µs   131.5 MB/sec
string/bloom_filter                       1.01      2.4±0.02ms   837.8 MB/sec    1.00      2.4±0.04ms   845.3 MB/sec
string/default                            1.01    776.7±4.10µs     2.6 GB/sec    1.00    772.2±7.42µs     2.6 GB/sec
string/parquet_2                          1.00   1306.4±9.46µs  1567.7 MB/sec    1.00  1308.1±14.37µs  1565.7 MB/sec
string/zstd                               1.00      3.4±0.03ms   594.3 MB/sec    1.00      3.4±0.02ms   596.1 MB/sec
string/zstd_parquet_2                     1.01      3.7±0.04ms   550.3 MB/sec    1.00      3.7±0.03ms   554.7 MB/sec
string_and_binary_view/bloom_filter       1.00    591.3±5.68µs   213.4 MB/sec    1.02    600.4±6.46µs   210.2 MB/sec
string_and_binary_view/default            1.00    351.0±1.94µs   359.5 MB/sec    1.02    356.6±0.91µs   353.9 MB/sec
string_and_binary_view/parquet_2          1.00    383.7±1.66µs   328.9 MB/sec    1.04    398.3±2.63µs   316.9 MB/sec
string_and_binary_view/zstd               1.00    605.6±2.30µs   208.4 MB/sec    1.01    610.5±2.72µs   206.7 MB/sec
string_and_binary_view/zstd_parquet_2     1.00    743.1±2.49µs   169.8 MB/sec    1.00    742.3±4.29µs   170.0 MB/sec
string_dictionary/bloom_filter            1.01    624.0±2.95µs  1653.9 MB/sec    1.00    620.2±6.60µs  1663.9 MB/sec
string_dictionary/default                 1.02    395.7±3.05µs     2.5 GB/sec    1.00    388.6±2.24µs     2.6 GB/sec
string_dictionary/parquet_2               1.01    392.9±1.34µs     2.6 GB/sec    1.00    387.9±3.10µs     2.6 GB/sec
string_dictionary/zstd                    1.01   1163.9±7.26µs   886.7 MB/sec    1.00   1156.6±3.29µs   892.3 MB/sec
string_dictionary/zstd_parquet_2          1.01  1974.5±16.35µs   522.7 MB/sec    1.00  1963.2±18.26µs   525.7 MB/sec
string_non_null/bloom_filter              1.00      3.1±0.04ms   651.7 MB/sec    1.00      3.1±0.05ms   653.9 MB/sec
string_non_null/default                   1.01  1139.8±10.58µs  1796.0 MB/sec    1.00  1127.9±13.53µs  1815.0 MB/sec
string_non_null/parquet_2                 1.02  1885.6±19.25µs  1085.6 MB/sec    1.00  1855.6±19.42µs  1103.2 MB/sec
string_non_null/zstd                      1.00      3.2±0.03ms   633.9 MB/sec    1.00      3.2±0.03ms   636.9 MB/sec
string_non_null/zstd_parquet_2            1.02      5.0±0.09ms   407.2 MB/sec    1.00      4.9±0.06ms   415.1 MB/sec

Contributor

@etseidl etseidl left a comment

Looks good to me, thanks @paleolimbot.

One question I have has to do with the column chunk Statistics and the column index. Am I correct that if geo stats are written, the column chunk stats should be None? And should the column index for such a column also be None? If so, could you add a test that verifies this? 🙏 Could be in a later PR.

@paleolimbot
Member Author

Am I correct that if geo stats are written, the column chunk stats should be None?

The min/max value should be absent but the null count should still be there. I added a test!

And should the column index for such a column also be None?

I actually have no idea what a column index is, which suggests to me that it should be None 🙂

@etseidl
Contributor

etseidl commented Oct 9, 2025

And should the column index for such a column also be None?

I actually have no idea what a column index is, which suggests to me that it should be None 🙂

It's a version of the page statistics available without having to parse the individual page headers. It has the unfortunate(*) property that min and max are mandatory, so if either min or max is None (as is the case here), the column index should not be written (which from the test you added seems to be the case).

(*) Unfortunate because there is other information in the column index beyond min and max statistics that can still be of use for page pruning. Null pages and level histograms among them.
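
The all-or-nothing rule described above can be sketched as follows. This is a simplified illustration, not the arrow-rs writer: `PageStats`, `ColumnIndexSketch`, and `build_column_index` are hypothetical names, and the special handling of all-null pages is ignored. It shows only that a single page without min or max prevents the whole index from being built, even though the null counts were available.

```rust
/// Hypothetical per-page statistics; min/max are raw bytes.
struct PageStats {
    min: Option<Vec<u8>>,
    max: Option<Vec<u8>>,
    null_count: i64,
}

/// Hypothetical column index in which min and max are mandatory for every page.
struct ColumnIndexSketch {
    min_values: Vec<Vec<u8>>,
    max_values: Vec<Vec<u8>>,
    null_counts: Vec<i64>,
}

/// Build the index, or return `None` if any page lacks min or max.
fn build_column_index(pages: &[PageStats]) -> Option<ColumnIndexSketch> {
    let mut index = ColumnIndexSketch {
        min_values: Vec::new(),
        max_values: Vec::new(),
        null_counts: Vec::new(),
    };
    for page in pages {
        // `?` abandons the whole index as soon as one bound is missing.
        let min = page.min.as_ref()?;
        let max = page.max.as_ref()?;
        index.min_values.push(min.clone());
        index.max_values.push(max.clone());
        index.null_counts.push(page.null_count);
    }
    Some(index)
}
```

For a Geometry or Geography column every page would carry `min: None, max: None`, so `build_column_index` returns `None` and the null counts are dropped along with it, which is the drawback noted in the footnote.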

@etseidl
Contributor

etseidl commented Oct 9, 2025

I think this is ready to merge. @alamb have your concerns been addressed?

Comment on lines +21 to +28
pub fn wkb_point_xy(x: f64, y: f64) -> Vec<u8> {
    let mut item: [u8; 21] = [0; 21];
    item[0] = 0x01;
    item[1] = 0x01;
    item[5..13].copy_from_slice(x.to_le_bytes().as_slice());
    item[13..21].copy_from_slice(y.to_le_bytes().as_slice());
    item.to_vec()
}
Member

It's not a huge deal for XY and XYZM points, but if we want more complex helpers for more complex geometries, I think it would be more maintainable and more understandable for future people to use an existing crate to generate the WKB buffers. (In my own projects I use wkt::types as simple extended-dimension geometry types that I pass into wkb APIs)

Member Author

Definitely! The GeometryBounder is tested with those here (where we have wkt as a dev dependency). I don't mind how these are implemented (I just needed something for the parquet/geospatial tests).
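
For readers unfamiliar with the byte layout behind the hand-rolled helper, the sketch below spells it out for the XYZM case mentioned above. The function names `wkb_point_xyzm` and `wkb_point_xy_ordinates` are assumptions made for this illustration and may not match anything in the PR. The layout itself is standard little-endian ISO WKB: one byte-order byte, a 4-byte geometry type (1 for Point, 1001 for Point Z, 2001 for Point M, 3001 for Point ZM), then one 8-byte float per ordinate.

```rust
/// Encode an XYZM point as little-endian ISO WKB (hypothetical helper name).
pub fn wkb_point_xyzm(x: f64, y: f64, z: f64, m: f64) -> Vec<u8> {
    // 1 byte-order byte + 4 type bytes + 4 ordinates of 8 bytes each = 37 bytes.
    let mut item = Vec::with_capacity(37);
    item.push(0x01); // byte order flag: little endian
    item.extend_from_slice(&3001u32.to_le_bytes()); // ISO WKB type code: Point ZM
    for ordinate in [x, y, z, m] {
        item.extend_from_slice(&ordinate.to_le_bytes());
    }
    item
}

/// Read X and Y back from a little-endian WKB point of any dimension
/// (hypothetical helper name). Returns `None` for anything else.
pub fn wkb_point_xy_ordinates(wkb: &[u8]) -> Option<(f64, f64)> {
    if wkb.len() < 21 || wkb[0] != 0x01 {
        return None;
    }
    let geometry_type = u32::from_le_bytes([wkb[1], wkb[2], wkb[3], wkb[4]]);
    // Accept Point (1), Point Z (1001), Point M (2001), and Point ZM (3001).
    if geometry_type % 1000 != 1 || geometry_type / 1000 > 3 {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&wkb[5..13]);
    let x = f64::from_le_bytes(buf);
    buf.copy_from_slice(&wkb[13..21]);
    let y = f64::from_le_bytes(buf);
    Some((x, y))
}
```

Even this small pair of functions has to hard-code type codes and offsets, which supports the suggestion to lean on an existing crate once helpers for more complex geometries are needed.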

@alamb
Contributor

alamb commented Oct 9, 2025

The only thing I want to make sure of is that this doesn't impact writing performance. The benchmark results above seem to suggest there might be an impact.

list_primitive/parquet_2                  1.00      2.3±0.01ms   908.3 MB/sec    1.14      2.7±0.04ms   794.4 MB/sec

However, I tried to reproduce this locally and it looks fine to me.

cargo bench --bench arrow_writer -- "list_primitive/parquet_2"
list_primitive/parquet_2
                        time:   [994.65 µs 1.0011 ms 1.0083 ms]
                        thrpt:  [2.0649 GiB/s 2.0797 GiB/s 2.0931 GiB/s]
                 change:
                        time:   [+0.0649% +2.0194% +3.9738%] (p = 0.04 < 0.05)
                        thrpt:  [-3.8219% -1.9795% -0.0649%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

@alamb
Contributor

alamb commented Oct 9, 2025

Three approvals so let's get this one in and we can iterate if necessary in follow-on PRs!

@alamb alamb merged commit 56e9c86 into apache:main Oct 9, 2025
20 checks passed
@alamb
Contributor

alamb commented Oct 9, 2025

Thanks again @paleolimbot @etseidl and @kylebarron !

@alamb
Contributor

alamb commented Oct 9, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing spatial-stats-write (d5ba2f2) to d5df352 diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_writer
BENCH_FILTER=
BENCH_BRANCH_NAME=spatial-stats-write
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 9, 2025

🤖: Benchmark completed

Details

group                                     main                                   spatial-stats-write
-----                                     ----                                   -------------------
bool/bloom_filter                         1.00    130.2±1.02µs     8.1 MB/sec    1.00    129.6±0.87µs     8.2 MB/sec
bool/default                              1.00     53.3±0.87µs    19.9 MB/sec    1.00     53.3±0.19µs    19.9 MB/sec
bool/parquet_2                            1.01     68.1±0.11µs    15.6 MB/sec    1.00     67.3±0.26µs    15.7 MB/sec
bool/zstd                                 1.00     63.7±0.35µs    16.6 MB/sec    1.01     64.1±1.01µs    16.5 MB/sec
bool/zstd_parquet_2                       1.00     78.3±0.35µs    13.5 MB/sec    1.00     78.5±2.85µs    13.5 MB/sec
bool_non_null/bloom_filter                1.01    106.7±0.74µs     5.4 MB/sec    1.00    106.0±0.81µs     5.4 MB/sec
bool_non_null/default                     1.00     19.9±0.05µs    28.8 MB/sec    1.01     20.0±0.10µs    28.6 MB/sec
bool_non_null/parquet_2                   1.01     38.0±0.30µs    15.0 MB/sec    1.00     37.7±0.42µs    15.2 MB/sec
bool_non_null/zstd                        1.00     28.7±0.22µs    19.9 MB/sec    1.00     28.8±0.13µs    19.9 MB/sec
bool_non_null/zstd_parquet_2              1.01     47.7±0.27µs    12.0 MB/sec    1.00     47.4±0.45µs    12.1 MB/sec
float_with_nans/bloom_filter              1.00    934.0±7.35µs    58.8 MB/sec    1.02    955.5±9.75µs    57.5 MB/sec
float_with_nans/default                   1.00    574.3±1.96µs    95.7 MB/sec    1.02    583.6±3.25µs    94.2 MB/sec
float_with_nans/parquet_2                 1.00    823.1±1.78µs    66.8 MB/sec    1.01    834.2±5.07µs    65.9 MB/sec
float_with_nans/zstd                      1.00    752.0±2.24µs    73.1 MB/sec    1.01    761.6±2.10µs    72.2 MB/sec
float_with_nans/zstd_parquet_2            1.00   1004.8±5.51µs    54.7 MB/sec    1.02   1020.0±5.99µs    53.9 MB/sec
list_primitive/bloom_filter               1.65      4.0±0.05ms   536.2 MB/sec    1.00      2.4±0.02ms   885.3 MB/sec
list_primitive/default                    1.00   1722.0±6.50µs  1238.0 MB/sec    1.00  1720.7±11.33µs  1239.0 MB/sec
list_primitive/parquet_2                  1.33      2.3±0.01ms   907.8 MB/sec    1.00  1772.3±11.03µs  1202.9 MB/sec
list_primitive/zstd                       1.39      4.2±0.05ms   510.3 MB/sec    1.00      3.0±0.01ms   709.7 MB/sec
list_primitive/zstd_parquet_2             1.32      4.0±0.05ms   535.5 MB/sec    1.00      3.0±0.01ms   705.7 MB/sec
list_primitive_non_null/bloom_filter      1.69      4.8±0.07ms   444.4 MB/sec    1.00      2.8±0.03ms   749.5 MB/sec
list_primitive_non_null/default           1.00   1809.2±7.55µs  1175.9 MB/sec    1.01  1822.6±11.55µs  1167.2 MB/sec
list_primitive_non_null/parquet_2         1.47      2.9±0.03ms   722.2 MB/sec    1.00      2.0±0.01ms  1058.7 MB/sec
list_primitive_non_null/zstd              1.37      5.5±0.05ms   388.5 MB/sec    1.00      4.0±0.03ms   532.4 MB/sec
list_primitive_non_null/zstd_parquet_2    1.41      5.9±0.06ms   361.9 MB/sec    1.00      4.2±0.04ms   511.3 MB/sec
primitive/bloom_filter                    1.00      4.4±0.13ms    40.3 MB/sec    1.01      4.4±0.16ms    39.8 MB/sec
primitive/default                         1.00    853.9±4.99µs   206.0 MB/sec    1.00    851.2±5.72µs   206.7 MB/sec
primitive/parquet_2                       1.03   1006.2±5.65µs   174.8 MB/sec    1.00   977.0±75.91µs   180.1 MB/sec
primitive/zstd                            1.00   1152.3±3.71µs   152.7 MB/sec    1.01   1158.8±6.75µs   151.8 MB/sec
primitive/zstd_parquet_2                  1.21   1343.7±5.43µs   130.9 MB/sec    1.00   1106.4±5.01µs   159.0 MB/sec
primitive_non_null/bloom_filter           2.57      4.5±0.17ms    38.3 MB/sec    1.00  1755.6±24.83µs    98.3 MB/sec
primitive_non_null/default                1.01    733.4±4.05µs   235.2 MB/sec    1.00    723.8±2.76µs   238.4 MB/sec
primitive_non_null/parquet_2              1.25    905.9±7.05µs   190.4 MB/sec    1.00   723.6±10.32µs   238.4 MB/sec
primitive_non_null/zstd                   1.01  1020.4±10.58µs   169.1 MB/sec    1.00   1007.4±3.47µs   171.2 MB/sec
primitive_non_null/zstd_parquet_2         1.31  1337.5±23.18µs   129.0 MB/sec    1.00   1021.4±6.06µs   168.9 MB/sec
string/bloom_filter                       1.91      2.4±0.04ms   847.6 MB/sec    1.00  1262.1±20.14µs  1622.8 MB/sec
string/default                            1.01    776.0±5.81µs     2.6 GB/sec    1.00    767.8±5.29µs     2.6 GB/sec
string/parquet_2                          1.67   1289.6±9.39µs  1588.2 MB/sec    1.00    772.5±4.04µs     2.6 GB/sec
string/zstd                               1.50      3.5±0.04ms   590.7 MB/sec    1.00      2.3±0.01ms   885.4 MB/sec
string/zstd_parquet_2                     1.62      3.8±0.03ms   544.9 MB/sec    1.00      2.3±0.02ms   882.4 MB/sec
string_and_binary_view/bloom_filter       1.00    593.1±4.89µs   212.8 MB/sec    1.00    595.5±9.04µs   211.9 MB/sec
string_and_binary_view/default            1.00    353.6±4.51µs   356.9 MB/sec    1.00    353.1±1.05µs   357.5 MB/sec
string_and_binary_view/parquet_2          1.09    385.9±4.93µs   327.0 MB/sec    1.00    354.5±0.87µs   356.0 MB/sec
string_and_binary_view/zstd               1.01    605.9±8.72µs   208.3 MB/sec    1.00    600.8±1.41µs   210.1 MB/sec
string_and_binary_view/zstd_parquet_2     1.24    738.1±3.41µs   171.0 MB/sec    1.00    596.4±8.46µs   211.6 MB/sec
string_dictionary/bloom_filter            1.00    615.8±4.99µs  1675.9 MB/sec    1.01    624.1±4.65µs  1653.5 MB/sec
string_dictionary/default                 1.00    386.9±3.68µs     2.6 GB/sec    1.00    385.9±2.11µs     2.6 GB/sec
string_dictionary/parquet_2               1.00    386.3±2.62µs     2.6 GB/sec    1.02    393.9±2.07µs     2.6 GB/sec
string_dictionary/zstd                    1.00  1158.9±13.37µs   890.5 MB/sec    1.00   1158.5±5.78µs   890.8 MB/sec
string_dictionary/zstd_parquet_2          1.71  1969.3±19.72µs   524.1 MB/sec    1.00   1151.0±6.43µs   896.6 MB/sec
string_non_null/bloom_filter              1.74      3.2±0.07ms   648.8 MB/sec    1.00  1809.0±30.46µs  1131.6 MB/sec
string_non_null/default                   1.00  1121.4±17.72µs  1825.5 MB/sec    1.01   1128.9±7.51µs  1813.4 MB/sec
string_non_null/parquet_2                 1.59  1847.4±26.60µs  1108.1 MB/sec    1.00  1165.3±11.89µs  1756.8 MB/sec
string_non_null/zstd                      1.01      3.3±0.05ms   625.2 MB/sec    1.00      3.3±0.02ms   629.3 MB/sec
string_non_null/zstd_parquet_2            1.52      4.9±0.07ms   414.2 MB/sec    1.00      3.3±0.03ms   627.9 MB/sec

@paleolimbot paleolimbot deleted the spatial-stats-write branch October 10, 2025 13:44

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support writing GeospatialStatistics in Parquet writer

4 participants